Abstract


Emulators that translate algorithms from the shared-memory model to two different
message-passing models are presented. Both are achieved by implementing a wait-free, atomic,
single-writer multi-reader register in unreliable, asynchronous networks. The two
message-passing models considered are a complete network with processor failures and an
arbitrary network with dynamic link failures.

These results make it possible to view the shared-memory model as a higher-level language
for designing algorithms in asynchronous distributed systems. Any wait-free algorithm
based on atomic, single-writer multi-reader registers can be automatically emulated in
message-passing systems. The overhead introduced by these emulations is polynomial in the
number of processors in the system.

Immediate new results are obtained by applying the emulators to known shared-memory
algorithms. These include, among others, protocols to solve the following problems in the
message-passing model in the presence of processor or link failures: multi-writer multi-reader
registers, concurrent time stamp systems, ℓ-exclusion, atomic snapshots, randomized
consensus, and implementation of a class of data structures.

Keywords:  Message  passing,  shared  memory,  dynamic  networks,  fault  tolerance,  wait-free 
algorithms,  emulations,  atomic  registers. 


1  Introduction 


Two major interprocessor communication models in distributed systems have attracted much
attention and study: the shared-memory model and the message-passing model. In the
shared-memory model, n processors communicate by writing and reading to shared atomic
registers. In the message-passing model, n processors are located at the nodes of a network
and communicate by sending messages over communication links.

In both models we consider asynchronous unreliable systems in which failures may occur.
In the shared-memory model, processors may fail by stopping (and a slow processor cannot be
distinguished from a failed processor). In the message-passing model failures may occur in
either of two ways. In the complete network model, processors may fail by stopping (without
being detected). In the arbitrary network model, links fail and recover dynamically, possibly
disconnecting the network for some periods.

The design of fault-tolerant (or wait-free) algorithms in either of these models is a delicate
and error-prone task. However, this task is somewhat easier in shared-memory systems, where
processors enjoy a more global view of the system. A shared register guarantees that once
a processor reads a particular value, then, unless the value of this register is changed by a
write, every future read of this register by any other processor will obtain the same value.
Furthermore, the value of a shared register is always available, regardless of processor
slow-down or failure. These properties permit us to ignore issues that must be addressed in
message-passing systems. For example, there are discrepancies in the local views of different
processors that are not necessarily determined by the relative order at which processors
execute their operations.

An interesting example is provided by the problem of achieving randomized consensus.
Several solutions for this problem exist in the message-passing model, e.g., [16, 19, 25], and in
the shared-memory model, e.g., [18, 1, 9, 12]. However, the algorithm of [9] is the first to have
polynomial expected running time and still overcome an “omnipotent” adversary, one that has
access to the outcomes of local coin-flips. The difficulty of overcoming the asynchrony of
messages in the message-passing model made it hard to come up with algorithms that tolerate
such an omnipotent adversary with polynomial expected running time.¹

This paper presents emulators of shared-memory systems in message-passing systems
(networks), in the presence of processor or link failures. Any wait-free algorithm in the
shared-memory model that is based on atomic, single-writer multi-reader registers can be
emulated in both message-passing models. The overhead for the emulations is polynomial in the
number of processors. The complexity measures considered are the number of messages and their
size, the time and the local memory size for each read or write operation.

¹The synchronous message-passing algorithm of [26] is resilient to Byzantine faults, but requires
private communication links and thus is not resilient to an omnipotent adversary.

Thus, shared-memory systems may serve as a “laboratory” for designing resilient algorithms.
Once a problem is solved in the shared-memory model, it is automatically solved in
the message-passing model, and only optimization issues remain to be addressed.

Among the immediate new results obtained by applying the emulators to existing
shared-memory algorithms are network protocols that solve the following problems in the
presence of processor or link failures:

•  Atomic,  multi-writer  multi-reader  registers  ([36,  34]). 

•  Concurrent  time-stamp  systems  ([31,  2-1]). 

•  Variants of ℓ-exclusion ([22, 17, 4]).

•  Atomic  snapshot  scan  ([2,  7,  8]). 

•  Randomized consensus ([9, 12]).²

•  Implementation  of  a  class  of  data  structures  ([10]). 

First we introduce the basic communication primitive which is used in our algorithms. We
then present an unbounded emulator for the complete network in the presence of processor
failures. This implementation exposes some of the basic ideas underlying our constructions.
Moreover, part of the correctness proof for this emulator can be carried over to the other
models. We then describe the modifications needed in order to obtain the bounded emulator
for the complete network in the presence of processor failures. Finally, we modify this emulator
to work in an arbitrary network in the presence of link failures. We present two ways to do so.
The first modification is based on replacing each physical link of the complete network with a
“virtual viable link” using an end-to-end protocol ([5, 14, 6]). The second modification results
in a more efficient emulation. It is based on implementing our communication primitive as a
diffusing computation using the resynchronization technique of [6].

We consider systems that are completely asynchronous since this enables us to isolate the
study from any model-dependent synchronization assumptions. Although many “real”
shared-memory systems are at least partially synchronous, asynchrony allows us to provide an
abstract treatment of systems in which different processors have different priorities.

²This result also follows from the transformation of [15].

We believe that bounded solutions are important, although in reality, 20-bit counters
will not wrap around and thus will suffice for all practical purposes. The reason is that
bounded solutions are much more resilient: traditional protocols fail if an error occurs and
causes counters to grow without limit. An algorithm designed to handle bounded counters will
be able to recover from such a situation and resume normal operation.

Wait-free protocols in shared-memory systems enable a processor to complete any operation
regardless of the speed of other processors. In message-passing systems, it can be shown,
following the proof in [11], that for many problems requiring global coordination, there is no
solution that can prevail over a “strong” adversary, that is, an adversary that can stop a
majority of the processors or disconnect large portions of the network. Such an adversary can
cause two groups of fewer than a majority of the processors to operate separately by suspending
all the messages from one group to the other. For many global coordination problems this leads
to contradicting and inconsistent operations by the two groups. As mentioned in [11], similar
arguments show that processors cannot halt after deciding. Thus, in our emulators a processor
which is disconnected (permanently) from a majority of the processors is considered faulty and
is blocked.³ Our solutions do not depend on connection with a specific majority at any time.
Moreover, it might be that at no time there exists a full connection to any party. The only
condition is that messages will eventually reach some majority which will acknowledge them.
Although the difficult construction is the solution in the complete network with bounded
size messages, the unbounded construction is not straightforward. In both cases, to avoid
problems resulting from processors having old values we attach time-stamps to the values
written by the writer. In the unbounded construction, the time-stamps are the integer numbers.
In the bounded construction, we use a nontrivial method to let the writer keep track of old
time-stamps that are still in the system. This allows us to employ a bounded sequential
time-stamp system ([31]).

Some  of  the  previous  research  on  dynamic  networks  (e.g.,  [28,  3])  assumed  a  “grace  period” 
during  which  the  network  stabilizes  for  long  enough  time  in  order  to  guarantee  correctness. 
Our  results  do  not  rely  on  the  existence  of  such  a  period,  and  follow  the  approach  taken  in, 
e.g.,  [35,  5,  14,  6]. 

There are two related studies on the relationships between shared-memory and
message-passing systems. Bar-Noy and Dolev ([15]) provide translations between protocols in the
shared-memory and the message-passing models. These translations apply only to protocols
that use a very restricted form of communication. Chor and Moscovici ([20]) present a hierarchy
of resiliency for problems in shared-memory systems and complete networks, and show that
for some problems, the wait-free shared-memory model is not equivalent to a complete network
where up to half of the processors may fail. Their result, however, assumes that processors
halt after deciding.

³Such a processor will not be able to terminate its operation but will never produce erroneous results.

The rest of this paper is organized as follows. In Section 2, we briefly describe the various
models considered. In Section 3, we introduce the communication primitive. In Section 4,
we present an unbounded implementation for the complete network in the presence of processor
failures. In Section 5, we present the modifications needed in order to obtain the bounded
implementation for the complete network in the presence of processor failures. In Section 6, we
modify this emulator to work in an arbitrary network in the presence of link failures. We
conclude, in Section 7, with a discussion of the results and some directions for future research.


2  Preliminaries 

In this section we discuss the models addressed in this paper. Our definitions follow [32] for
shared-memory systems, [29] for complete networks with processor failures, and [14] for
arbitrary networks with link failures. In all models we consider, a system consists of n
independent and asynchronous processors, which we number 1, ..., n.

A formal definition of an atomic register can be found in [32]; the definition presented here
is an equivalent one (see [32, Proposition 3]) which is simpler to use. An atomic, single-writer
multi-reader register is an abstract data structure. Each register is accessed by two procedures:
write_w(v), which is executed only by some specific processor w, called the writer, and
read_r(v), which may be executed by any processor 1 ≤ r ≤ n, called a reader. It is assumed
that the values of these procedures satisfy the following two properties:

1.  Every  read  operation  returns  either  the  last  value  written  or  a  value  that  is  written 
concurrently  with  this  read. 

2. If a read operation ℛ_2 started after a read operation ℛ_1 has finished, then the value ℛ_2
returns cannot be older than the value returned by ℛ_1.
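
As a minimal sketch of how the second property could be checked on a recorded history, the
Python function below assumes (as Section 4 does for the unbounded emulator) that every
written value is identified with an integer label that grows with each write. The tuple layout
(start, finish, label) and the function name are illustrative assumptions, not part of the
definition in [32].

```python
def reads_are_monotone(reads):
    """Check property 2 on a list of completed read operations.

    Each read is a tuple (start, finish, label): the times at which the
    read started and finished, and the integer label of the value it
    returned (a larger label means a newer value).
    """
    return all(
        later[2] >= earlier[2]
        for earlier in reads
        for later in reads
        if later[0] > earlier[1]  # `later` started after `earlier` finished
    )
```

Reads that overlap in time are not constrained by this property, so the check only compares
pairs in which one read starts strictly after the other has finished.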

In message-passing systems, processors are located at the nodes of a network and
communicate by sending messages along communication links. Communication is completely
asynchronous and messages may incur an unknown delay. At each atomic step, a processor may
receive some set of messages that were sent to it, perform some local computation and send
some messages.

In the complete network model we assume that the network formed by the communication
links is complete, and that processors might be faulty. A faulty processor simply stops
operating. A nonfaulty processor is one that takes an infinite number of steps, and all of its
messages are delivered after a finite delay. We assume that at most ⌈n/2⌉ − 1 processors are
faulty in any execution of the system.

In dynamic networks communication links might become non-viable. A link is non-viable
if, starting from some message on, it will not deliver any further messages to the other
end-point. For those messages the delay is considered to be infinite. Otherwise, the link is
viable. This model is called the ∞-delay model in [5]. Afek and Gafni ([5]) point out that the
standard model of dynamic message-passing systems, where communication links alternate
between periods of operation and non-operation, can be reduced to this model. A processor
that is permanently disconnected from ⌈n/2⌉ processors or more is considered faulty. We assume
there are ⌊n/2⌋ + 1 processors that are eventually in the same connected component. Thus, at
most ⌈n/2⌉ − 1 processors are faulty.

The  complexity  measures  we  consider  are  the  following: 

1.  The  number  of  messages  sent  in  an  execution  of  a  write  or  read  operation, 

2.  the  size  of  the  messages, 

3.  the  time  it  takes  to  execute  a  write  or  read  operation,  under  the  assumption  that  any 
message  is  either  delivered  within  one  time  unit,  or  never  at  all  (cf.  [13]),  and 

4.  the  amount  of  the  overhead  local  memory  used  by  a  processor. 

For  all  these  measures,  we  are  interested  in  the  worst  case  complexity. 


3  Procedure  communicate 

In  this  section  we  present  the  basic  primitive  used  for  communication  in  our  algorithms,  called 
communicate.  This  primitive  operates  in  complete  networks.  It  enables  a  processor  to  send  a 
message  and  get  acknowledgements  (possibly  carrying  some  information)  from  a  majority  of 
the  processors. 

Because of possible processors’ crash failures, a processor cannot wait for acknowledgements
from all the other processors or from any particular processor. However, at least a majority
of the processors will not crash and thus a processor can wait to get acknowledgements from
them. Notice that processors want to communicate with any majority of the processors, not
necessarily the same majority each time. A processor utilizes the primitive to broadcast a
message ⟨M⟩ to all the processors and then to collect a corresponding ⟨ACK⟩ message from a
majority of them. In some cases, information will be added to the ⟨ACK⟩ messages.

For simplicity, we assume that each edge (i, j) is composed of two distinct “virtual” directed
edges (i, j) and (j, i). The communication on (i, j) is independent of the communication on
(j, i).

Procedure communicate uses a simple ping-pong mechanism. This mechanism ensures FIFO
communication on each directed link in the network, and guarantees that at any time only one
message is in transit on each link. Informally, this is achieved by the following rule: i sends
the first message on (i, j) and then i and j alternate turns in sending further messages and
acknowledgements on (i, j).

More precisely, the ping-pong on the directed edge (i, j) is managed by processor i.
Processor i maintains a vector turn of length n, with an entry for each processor that can get
the values my or his. If turn(j) = my then it is i's turn on (i, j) and only then i may send
a message to j. If turn(j) = his then either i's message is in transit, j's acknowledgement
is in transit, or j received i's message and has not replied yet (it might be that j crashed).
Initially, turn(j) = my. Hereafter, we assume that the vector turn is updated automatically
by the send and receive operations.⁴ For simplicity, a processor sends each message also to
itself and responds with the appropriate acknowledgement.
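
The turn rule for a single directed edge can be sketched in a few lines of Python. This is an
illustrative model only: the class name PingPongLink and its methods are hypothetical, and the
code tracks just processor i's local entry turn(j), not the network itself.

```python
MY, HIS = "my", "his"


class PingPongLink:
    """Processor i's bookkeeping for the directed edge (i, j)."""

    def __init__(self):
        self.turn = MY          # initially turn(j) = my
        self.unanswered = 0     # messages of i that j has not answered yet

    def can_send(self):
        return self.turn == MY

    def send(self):
        """Called when i sends a message to j."""
        if not self.can_send():
            raise RuntimeError("it is not i's turn on this link")
        self.turn = HIS
        self.unanswered += 1

    def receive_ack(self):
        """Called when i receives j's acknowledgement."""
        if self.turn != HIS:
            raise RuntimeError("no message is awaiting an acknowledgement")
        self.turn = MY
        self.unanswered -= 1
```

Because send is refused while turn(j) = his, the counter unanswered never exceeds one, which
is the "at most one message in transit" guarantee stated above.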

Procedure communicate gets as an input a message M and returns as an output a vector
info of length n. The jth entry in this vector contains information received with j's
acknowledgement (or ⊥ if no acknowledgement was received from j). To control the sending
of messages the procedure maintains a local vector status. The jth entry of this vector may
obtain one of the following values: notsent, meaning M was not sent to j (since turn(j) = his);
notack, meaning M was sent but not yet acknowledged by j; ack, meaning M was acknowledged
by j. Additional local variables in procedure communicate are the vector turn and the
integer counter #acks which counts the number of acknowledgements received so far.

The pseudo-code for this procedure appears in Figure 1. We note that whenever this
procedure is employed we also specify its companion procedure, ack, which specifies the
information sent with the acknowledgement for each message and the local computation
triggered by receiving a particular message.

The ping-pong mechanism guarantees the following two properties of the communicate
procedure. First, the acknowledgements stored in the output vector info were indeed sent as
acknowledgements to the message ⟨M⟩, i.e., at least ⌊n/2⌋ + 1 processors received the message ⟨M⟩.

⁴The details of how this is done are omitted from the code.

Procedure communicate(⟨M⟩; info);  (* for processor i *)
    #acks := 0;
    for all 1 ≤ j ≤ n do
        status(j) := notsent;
        info(j) := ⊥;
    for all 1 ≤ j ≤ n s.t. turn(j) = my do
        send ⟨M⟩ to j;
        status(j) := notack;
    repeat until #acks ≥ ⌊n/2⌋ + 1
        upon receiving ⟨m⟩ from j:
            if status(j) = notsent then
                (* acknowledgement of an old message *)
                send ⟨M⟩ to j;
                status(j) := notack;
            else if status(j) = notack then
                status(j) := ack;
                info(j) := m;
                #acks := #acks + 1;
end procedure communicate;

Figure 1: The procedure communicate.


Second, the number of messages sent during each execution of the procedure is at most 2n.
Also, it is not hard to see that the procedure terminates under our assumptions. The next
lemma summarizes the properties and the complexity of procedure communicate.

Lemma 3.1 The following all hold for each execution of procedure communicate by processor
i with the message ⟨M⟩:

1. if i is connected to at least a majority of the processors then the execution terminates,

2. at least ⌊n/2⌋ + 1 processors receive ⟨M⟩ and return the corresponding acknowledgement,

3. at most 2n messages are sent during this execution,

4. the procedure terminates after at most two time units, and

5. the size of i's local memory is O(n) times the size of the acknowledgements to ⟨M⟩.
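
The behaviour summarized by the lemma can be illustrated with a small single-process
simulation in Python. It is a simplified sketch, not the procedure of Figure 1: it assumes
every link starts with turn(j) = my (so the notsent branch never fires), models asynchrony by
shuffling the order in which acknowledgements arrive, and raises an exception where a real
processor would simply stay blocked. The function signature and the ack callback are
hypothetical.

```python
import random


def communicate(n, message, ack, crashed=frozenset(), seed=0):
    """Broadcast `message` and collect acknowledgements from a majority.

    ack(j, message) returns the information processor j attaches to its
    acknowledgement. Processors listed in `crashed` never answer.
    Returns (info, messages): info[j] is j's reply, or None (playing the
    role of the "no acknowledgement" entry), and `messages` counts the
    messages exchanged before the procedure returns.
    """
    rng = random.Random(seed)
    majority = n // 2 + 1
    info = [None] * n
    messages = n                      # the message is sent once per link
    responders = [j for j in range(n) if j not in crashed]
    rng.shuffle(responders)           # acknowledgements arrive in any order
    acks = 0
    for j in responders:
        if acks >= majority:          # a majority has answered: stop waiting
            break
        info[j] = ack(j, message)
        acks += 1
        messages += 1
    if acks < majority:
        # A real processor would wait forever here (it is "blocked").
        raise RuntimeError("blocked: no majority can be reached")
    return info, messages
```

In this model the message count is n sends plus one acknowledgement per responding
processor, which stays within the 2n bound of part 3.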


4  The  unbounded  implementation  -  complete  network 

Informally, in order to write a new value, the writer executes communicate to send its new
value to a majority of the processors. It completes the write operation only after receiving
acknowledgements from a majority of the processors. In order to read a value, the reader
sends a request to all processors and gets in return the latest values known to a majority of the
processors (using communicate). Then it adopts (returns) the maximal among them. Before
finishing the read operation, the reader announces the value it intends to adopt to at least a
majority of the processors (again by using communicate).

The  writer  appends  a  label  to  every  new  value  it  writes.  In  the  unbounded  implementation 
this  is  an  integer.  For  simplicity,  we  ignore  the  value  itself  and  identify  it  with  the  label. 

Processor i stores in its local memory a variable val_i, holding the most recent value of
the register known to i. This value may be acquired either during i's read operations, from
messages sent during other processors’ read operations, or directly from the writer. In addition,
i holds a vector of length n of the most recent values of the register sent to i by other processors.
Let V denote the number of bits needed to represent any value from the domain of all possible
values. We have

Proposition 4.1 The size of the local memory at each processor is O(nV).

In the implementation, there are three procedures: read for the reader, write for the writer,
and ack, used by all processors to respond to messages. These procedures utilize six types of
messages, arranged in three pairs, each consisting of a message and a corresponding
acknowledgement.

1. The pair of write messages.

⟨W, val⟩: sent by the writer in order to write val in its register.

⟨ACK-W⟩: the corresponding acknowledgement.

2. The first pair of read messages.

⟨R_1⟩: sent by the reader to request the recent value of the writer.

⟨val⟩: the corresponding acknowledgement, contains the sender's most updated value of
the register.

3. The second pair of read messages.

⟨R_2, val⟩: sent by the reader before terminating in order to announce that it is going to
return val as the value of the register.

⟨ACK-R_2⟩: the corresponding acknowledgement.

Clearly,  we  have 

Proposition 4.2 The maximum size of a message is O(V).

The descriptions of procedures write, read and ack appear in Figure 2. Procedure ack
instructs each processor what to do upon receiving a message according to the template in
Figure 1 (as explained in Section 3). We use void to say that the information sent with the
acknowledgements to a particular message is ignored. Since communication is done only by
communicate, Lemma 3.1 (part 1) implies

Lemma  4.3  Each  execution  of  a  read  operation  or  a  write  operation  terminates. 

The value contained in the write message and the second read message is called the
value communicated by the communicate procedure execution. The maximum value among
the values contained in the acknowledgements of the first read message is called the value
acknowledged by the communicate procedure execution. The following lemma deals with the
ordering of these values, and is the crux of the correctness proof.
Procedure read_i(val_i);  (* executed by processor i and returns val_i *)
    communicate(⟨R_1⟩, info);
    val_i := max_{1 ≤ j ≤ n} {info(j) | info(j) ≠ ⊥};
    communicate(⟨R_2, val_i⟩, void);
end procedure read_i;


Procedure write_w;  (* for the writer w *)
    val_w := val_w + 1;  (* the new value of the register *)
    communicate(⟨W, val_w⟩, void);
end procedure write_w;


Procedure ack_j;  (* executed by processor j *)
    case received from w
        ⟨W, val_w⟩:    val_j := max{val_w, val_j};
                       send ⟨ACK-W⟩ to w;
    case received from i
        ⟨R_1⟩:         send ⟨val_j⟩ to i;
        ⟨R_2, val_i⟩:  val_j := max{val_i, val_j};
                       send ⟨ACK-R_2⟩ to i;
end procedure ack_j;


Figure 2: The read, write and ack procedures of the unbounded emulator.
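
To make the interplay of the three procedures concrete, here is a hedged Python sketch that
simulates the unbounded emulator inside a single process. It is not the paper's code: the class
Emulator, its method names and the use of a random majority are assumptions made for
illustration, operations run one at a time (no true concurrency), and the caller is always
included in its own majority, reflecting the remark in Section 3 that a processor also sends
each message to itself.

```python
import random


class Emulator:
    """Single-writer multi-reader register over n simulated processors."""

    def __init__(self, n, crashed=(), seed=0):
        self.n = n
        self.val = [0] * n              # val_j stored at every processor j
        self.crashed = set(crashed)
        self.rng = random.Random(seed)

    def _communicate(self, caller, handler):
        """Run handler(j) at the caller and at enough others for a majority."""
        others = [j for j in range(self.n)
                  if j != caller and j not in self.crashed]
        need = self.n // 2              # majority size minus the caller
        if len(others) < need:
            raise RuntimeError("blocked: no majority can be reached")
        quorum = [caller] + self.rng.sample(others, need)
        return [handler(j) for j in quorum]

    def _adopt(self, j, value):
        """The ack rule for write and second-read messages: keep the maximum."""
        self.val[j] = max(self.val[j], value)

    def write(self, w):
        new_value = self.val[w] + 1     # the writer's next integer label
        self._communicate(w, lambda j: self._adopt(j, new_value))
        return new_value

    def read(self, i):
        # First round: collect the values known to a majority.
        info = self._communicate(i, lambda j: self.val[j])
        value = max(info)
        # Second round: announce the value before returning it.
        self._communicate(i, lambda j: self._adopt(j, value))
        return value
```

For example, with n = 5 and processors 3 and 4 crashed, a write by processor 0 followed by a
read by processor 1 returns the written label, because the three live processors form the only
available majority.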
Lemma 4.4 Assume a communicate procedure execution C_1 communicated x, and a
communicate procedure execution C_2 acknowledged y. Assume that C_1 has completed before C_2
has started. Then x ≤ y.

Proof: By Lemma 3.1 (part 2) and the code for ack, when C_1 is completed at least a majority
of the processors store x', such that x' ≥ x. Similarly, by Lemma 3.1 (part 2), in C_2
acknowledgements were received from at least a majority of the processors. Thus, there must be at
least one processor that stored a value x' ≥ x and acknowledged in C_2. Since y is maximal
among the values contained in the acknowledgements of C_2, it follows that y ≥ x' ≥ x. ■
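
The proof rests on the fact that any two majorities share at least one processor. A brute-force
check of this fact for small n is sketched below; the function name quorums_intersect is
illustrative, and the enumeration is only practical for small n.

```python
from itertools import combinations


def quorums_intersect(n, size):
    """True if every two subsets of {0, ..., n-1} of the given size intersect."""
    quorums = [set(q) for q in combinations(range(n), size)]
    return all(a & b for a in quorums for b in quorums)
```

With size = n // 2 + 1 (a majority) the check succeeds, while exactly half of an even n is not
enough: for n = 4 the sets {0, 1} and {2, 3} are disjoint.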

Since a write operation completes only after its communicate procedure completes, Lemma 4.4
implies

Lemma 4.5 Assume a read operation, ℛ, returns the value y. Then y is either the value of
the last write operation that was completed before ℛ started or it is the value of a concurrent
write operation.

In  a  similar  manner,  since  a  read  operation  completes  only  after  its  second  execution  of 
communicate  is  completed,  Lemma  4.4  implies 

Lemma 4.6 Assume some read operation, ℛ_1, returns the value x, and that another read
operation, ℛ_2, that started after ℛ_1 completed, returns y. Then x ≤ y.

Since processors communicate only by using the communicate procedure, Lemma 3.1 (parts
3 and 4) implies the following complexity propositions.

Proposition 4.7 At most 4n messages are sent during each execution of a read operation. At
most 2n messages are sent during each execution of a write operation.

Proposition 4.8 Each execution of a read operation takes at most 4 time units. Each
execution of a write operation takes at most 2 time units.

The  next  theorem  summarizes  the  above  discussion. 

Theorem 4.9 There exists an unbounded emulator of an atomic, single-writer multi-reader
register in a complete network, in the presence of at most ⌈n/2⌉ − 1 processor failures. Each
execution of a read operation or a write operation requires O(n) messages and O(1) time.


5  The  bounded  implementation  -  complete  network

5.1  Informal  Description 

The only source of unboundedness in the above emulation is the integer labels utilized by the
writer. In order to eliminate this, we use an idea which was employed previously in [31, 14].
The integer labels are replaced by a bounded sequential time-stamp system ([31]), which is a
finite domain ℒ of label values together with a total order relation ≺. Whenever the writer
needs a new label it produces a new one, larger (with respect to the ≺ order) than all the
labels that exist in the system. Thus, instead of just adding one to the label, as in the
unbounded emulation, here the writer invokes a special procedure called LABEL. The input
for this procedure is a set of labels and the output is a new label which is greater than all the
labels in this set. This can be achieved by the constructions presented in [31, 23] for bounded
sequential time-stamp systems.

The main difficulty in carrying this idea over to the message-passing model is in maintaining
the set of labels existing in the system, a task which need not be addressed in the
shared-memory model (cf. [31, 33]). Notice that in order to assure correctness, it suffices to
guarantee that the set of labels that exist in the system is contained in the input set of labels
of procedure LABEL. The key idea is as follows.

Whenever a processor adopts a label (as the maximum value of the writer it knows about),
it records this fact in the system. This is done by broadcasting an appropriate message and
waiting for acknowledgements from a majority of the processors (using communicate). Upon
receiving a recording message, a processor stores the information it contains in its local memory,
but ignores the values it carries. This process guarantees that labels do not get lost as a
majority of the processors have recorded them.

To avoid inconsistencies that might occur, a processor blocks all computation that is
related to new labels during the recording process. It does not adopt new labels and does not
send nonrecording messages containing new labels. An independent ping-pong mechanism is
employed for each type of messages, e.g., i may send a recording message to j although j did
not acknowledge a read message of i. Since recording messages do not cause a processor to
adopt a label, deadlock is avoided.

5.2  Data  Structures  and  Messages

To implement the recording process, each processor i maintains an n x n matrix L_i of labels.
The ith row vector L_i(i) is updated dynamically by i according to messages i sends. The jth
row vector L_i(j) is updated by the messages i receives from j during a recording process
initiated by j. Each entry, L_i(i, k), is composed of two fields: sent and ack. The field
L_i(i, k).sent contains the last label i sent to k and the field L_i(i, k).ack is the last label
i sent to k as an acknowledgement to a read request of k. In particular, L_i(i, i) is the current
maximum label of the writer known to i. The writer starts each write operation by obtaining from
a majority of the processors their most updated values for the matrix L (using communicate).
The union of the labels that appear in its own matrix and these matrices is the input to
procedure LABEL.
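
The matrix and the union that forms the input to LABEL might be represented as in the following
Python sketch. It assumes integer labels, and it assumes that the diagonal entry keeps the
current label in its sent field; Entry, new_matrix, labels_in and label_input are hypothetical
names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One matrix entry with the two fields named in the text."""
    sent: Optional[int] = None  # last label sent to k
    ack: Optional[int] = None   # last label sent to k as a read acknowledgement


def new_matrix(n: int) -> list:
    """Create an empty n x n label matrix."""
    return [[Entry() for _ in range(n)] for _ in range(n)]


def labels_in(matrix: list) -> set:
    """Collect every non-empty label stored in a matrix."""
    found = set()
    for row in matrix:
        for entry in row:
            found.update(v for v in (entry.sent, entry.ack) if v is not None)
    return found


def label_input(own: list, collected: list) -> set:
    """Union of the labels in the writer's matrix and in the collected matrices."""
    result = labels_in(own)
    for matrix in collected:
        result |= labels_in(matrix)
    return result
```

The set returned by label_input plays the role of the argument passed to procedure LABEL; how
LABEL picks a label greater than all of them is left to the constructions of [31, 23].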

Procedures  read  and  write  use  five  pairs  of  messages  and  corresponding  acknowledgements. 

1.  The  first  pair  of  write  messages. 

⟨W_1⟩: sent by the writer at the beginning of its operation in order to collect information
about existing labels.

⟨L⟩: the corresponding acknowledgement; L is the sender's updated value of the labels'
matrix.

2.  The second pair of write messages, ⟨W_2, val⟩ and ⟨ACK-W_2⟩, the first pair of read
messages, ⟨R_1⟩ and ⟨val⟩, and the second pair of read messages, ⟨R_2, val⟩ and ⟨ACK-R_2⟩,
are the same as the corresponding messages in the unbounded algorithm.

3.  The pair of recording messages.

⟨REC, L_i(i)⟩: before adopting any new value for the register, processor i sends L_i(i) to
other processors. The vector L_i(i) contains this new value and all the recent values
that i sent on its links to other processors.

⟨ACK-REC⟩: the corresponding acknowledgement.

The longest message is ⟨L⟩. Denoting V = log |ℒ|, we have

Proposition 5.1 The maximum size of a message is O(n^2 · V).

Recall that during the recording process, processors do not reply to nonrecording messages.
Therefore, messages are accumulated in the local memory of the processor and are ordered in
a queue. As soon as the recording process ends, the processor first handles the messages on the
queue (the details of how this queue is handled are omitted). Due to the ping-pong mechanism
the length of this queue is at most O(n). As each message on the queue contains (at most) a
vector of n labels and the matrix L_i is of size O(n^2 · V), we have

Proposition 5.2 The size of the local memory of a reader is O(n^2 · V). The size of the local
memory of a writer is O(n^3 · V).

5.3  The  Algorithm 

The pseudo-code for the algorithm appears in Figure 3. Procedure update and the first part
of procedure recording dynamically update the vector L_i(i). Therefore, in procedure read, it
is enough to take val_i as L_i(i, i). The flag blocked is set to true during the recording process
and prevents the processor from receiving or sending some messages as described in procedure
ack. As mentioned before, in order to prevent deadlocks a separate ping-pong mechanism is
employed for each type of message. In order to distinguish between the different mechanisms,
calls to communicate are subscripted with the message type.
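
The effect of the blocked flag on acknowledgements can be illustrated by a small Python sketch.
It is a simplification under stated assumptions: labels are integers, only the message kinds
R1, R2, W2 and REC are modelled, adoption of new labels is left out, and the class Acker with
its methods handle and unblock is hypothetical.

```python
from collections import deque


class Acker:
    """Hypothetical processor j that answers incoming messages."""

    def __init__(self, current_label: int):
        self.current = current_label  # plays the role of L_j(j, j)
        self.blocked = False          # True while a recording process runs
        self.pending = deque()        # deferred nonrecording messages
        self.recorded = {}            # sender -> recorded row, never adopted

    def handle(self, sender, kind, payload=None):
        """Return the reply to send now, or None if the message is deferred."""
        if kind == "REC":
            self.recorded[sender] = payload
            return "ACK-REC"  # recording messages are never blocked
        if kind not in ("R1", "R2", "W2"):
            raise ValueError(f"unsupported message kind: {kind}")
        # R1 replies carry a label; R2 and W2 matter only if their label is new.
        carries_new = kind == "R1" or (payload is not None and payload > self.current)
        if self.blocked and carries_new:
            self.pending.append((sender, kind, payload))
            return None
        if kind == "R1":
            return self.current
        return "ACK-" + kind

    def unblock(self):
        """End the recording process and answer the deferred messages in order."""
        self.blocked = False
        replies = []
        while self.pending:
            sender, kind, payload = self.pending.popleft()
            replies.append((sender, self.handle(sender, kind, payload)))
        return replies
```

Because REC is answered before the blocked test, two processors that are both recording can
still acknowledge each other, which is the reason the separate ping-pong mechanisms avoid
deadlock.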

5.4  Correctness  and  Complexity 

Atomicity of the bounded emulator follows from the same reasoning as in the unbounded case
(Lemma 4.5 and Lemma 4.6). The following lemma is the core of the correctness proof for the
bounded emulator: it assures that the writer always obtains a superset of the labels that might
be adopted as the register's value by some processor. We call a label x viable if, in some system
state, at some possible extension from this state, for some processor i, val_i = x. Intuitively,
a viable label is held by some processor as the current register's value or it will become the
current register's value for some processor.

Lemma  5.3  Each  viable  label  is  stored  either  in  the  writer  matrix  or  in  the  matrices  of  at 
least  a  majority  of  the  processors. 

Proof: We say that processor i is responsible for label x if x is stored in L_i(i), i.e., if either
L_i(i, i) = x, L_i(i, j).sent = x or L_i(i, j).ack = x. We first claim that for any viable label there
exists a processor that is responsible for it. Assume that x is a label that is held by i as the
current register's value; then by the code of the algorithm L_i(i, i) = x and by definition i is
responsible for x. Assume x will become the current register's value for processor j in the
future; then it must be that some processor i has sent it to j (either by ⟨W_2⟩ messages of
i or in response to an R_1 request message by j), thus x ∈ L_i(i, j).

Now assume that i is responsible for x. Look at a simple path on which the label x has
arrived at i, i.e., a sequence i_0, i_1, ..., i_m, where i_0 is the writer and i_m = i. In this sequence,
for any 1 ≤ ℓ ≤ m, processor i_ℓ adopted x as a result of a message from i_{ℓ-1}.

    Procedure read_i(val_i) ;  (* executed by processor i and returns val_i *)
        communicate_R(⟨R_1⟩, info) ;
        val_i := L_i(i, i) ;
        communicate_R(⟨R_2, val_i⟩, void) ;
    end procedure read_i ;

    Procedure write_w ;  (* for the writer w *)
        communicate_W(⟨W_1⟩, L) ;
        L_w(w, w) := LABEL(∪L) ;  (* all the non-empty entries in L *)
        communicate_W(⟨W_2, L_w(w, w)⟩, void) ;
    end procedure write_w ;

    Procedure recording_i ;  (* executed by processor i *)
        upon receiving new label x > L_i(i, i)
        blocked := true ;
        L_i(i, i) := x ;
        communicate_REC(⟨REC, L_i(i)⟩, void) ;
        blocked := false ;
    end procedure recording_i ;

    Procedure update_i ;  (* executed by processor i *)
        upon sending label x to j in i's read operation:
            L_i(i, j).sent := x ;
        upon sending label x to j in j's read operation:
            L_i(i, j).ack := x ;
    end procedure update_i ;

    Procedure ack_j ;  (* executed by processor j *)
        case received from w
            ⟨W_1⟩: send ⟨L_j⟩ to w ;
            ⟨W_2, val_w⟩: if val_w > L_j(j, j) then wait until blocked = false ;
                          send ⟨ACK-W_2⟩ to w ;
        case received from i
            ⟨R_1⟩: wait until blocked = false ;
                   send ⟨L_j(j, j)⟩ to i ;
            ⟨R_2, val_i⟩: if val_i > L_j(j, j) then wait until blocked = false ;
                          send ⟨ACK-R_2⟩ to i ;
            ⟨REC, L_i(i)⟩: L_j(i) := L_i(i) ;
                           send ⟨ACK-REC⟩ to i ;
    end procedure ack_j ;

Figure 3: The read, write, recording, update and ack procedures of the bounded emulator.

The claim is proved by induction on m, the length of this path. The base case, m = 0,
occurs when i is the writer. Then the codes of procedures update and write imply that x
is stored in i's matrix. For the induction step, assume that m > 0, and that the induction
hypothesis holds for any ℓ, 0 ≤ ℓ < m. We have two cases.

1.  The first case is when i has not finished the recording process for x. It follows from the
code of procedure recording that L_i(i, i) = x. We show that k = i_{m-1} is responsible for
x, and the lemma follows from the induction hypothesis.

If i received x from k through an R_2 (W_2) message, then since i is blocked during the
recording process it would not reply until the recording process of x is done.
Consequently, L_k(k, i).sent = x.

If i received x from k through an ACK-R_1 message, then since i would not terminate a
read operation until it finishes the recording process of x, it would not start a new read
operation. Consequently, L_k(k, i).ack = x.

2.  The second case is when i has finished the recording process for x. If L_i(i, i) = x, i.e., x is
still the current value that i holds, then the code for procedure recording and the properties
of procedure communicate (Lemma 3.1, part 2) imply that x is stored in the matrices of
at least a majority of the processors.

If L_i(i, i) ≠ x, then since i is responsible for x there must exist a j such that x ∈ L_i(i, j).
Furthermore, since i has a more recent value for the register it must be that L_i(i, i) =
y ≻ x. By the code for procedure recording and the properties of procedure communicate
(Lemma 3.1), at the end of the recording process for x, x is stored as L(i, i) in the
matrices of at least a majority of the processors. Let k be some processor that recorded
x for i, i.e., such that L_k(i, i) = x at the end of the recording process for x.

If currently L_k(i, i) = z ≠ x, then it must be that x ≺ z. Since forwarding a new value
is blocked during the recording process, it must be that x was sent by i to j before the
recording process for z started. Thus x ∈ L_i(i, j) during the recording process for z,
and consequently x ∈ L_k(i, j). Therefore, x appears in the matrices of a majority of the
processors.
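
The invariant stated in Lemma 5.3 can be phrased as a check over a snapshot of all matrices.
The Python sketch below is only an illustration: it assumes that every matrix cell is modelled
as a set of integer labels, and the helper names empty_matrix, stored_in and lemma_holds are
hypothetical.

```python
def empty_matrix(n):
    """Create an n x n matrix whose cells are empty sets of labels."""
    return [[set() for _ in range(n)] for _ in range(n)]


def stored_in(matrix, label):
    """True if the label appears in any cell of the given matrix."""
    return any(label in cell for row in matrix for cell in row)


def lemma_holds(label, writer, matrices):
    """Check that the label is in the writer's matrix or in a majority of the matrices."""
    if stored_in(matrices[writer], label):
        return True
    holders = sum(1 for m in matrices if stored_in(m, label))
    return holders > len(matrices) // 2
```

Since the writer collects matrices from a majority of the processors, any label held by a
majority intersects the collected set, so a label passing this check reaches the input of
procedure LABEL.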


Lemma  5.3  and  the  constructions  of  bounded  sequential  time-stamp  systems  of  [31,  23] 
imply 

Corollary  5.4  The  new  label  generated  by  procedure  LABEL  is  greater  than  any  viable  label  in 
the  system. 
Recording messages are acknowledged immediately and are never blocked. Thus, a processor
never deadlocks during a recording process and will eventually acknowledge all the
messages it receives. The next lemma follows since during a read or a write operation, at most
2n recording processes could occur.

Lemma 5.5 Each execution of a read operation or a write operation terminates.

Each acknowledgement the reader receives might cause it to initiate a recording process.
By Lemma 3.1, part 3, at most 2n messages are sent during each of these recording processes.
In addition, each message of type W_2 or R_2 might cause other processors to initiate a recording
process. Thus, at most O(n^2) messages are sent during each execution of an operation, and it
takes at most O(1) time units. Thus we have

Proposition 5.6 At most O(n^2) messages are sent during each execution of a read or a write
operation.

Proposition  5.7  Each  execution  of  a  read  or  write  operation  takes  at  most  6  time  units. 
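
The counting argument behind the message bound can be made concrete with a short Python
calculation. It assumes, as a simplification, that an operation makes two calls to communicate
of its own, that every call (including each recording process) costs at most 2n messages, and
that at most 2n recording processes occur; message_bound is a hypothetical helper.

```python
def message_bound(n: int, own_calls: int = 2, max_recordings=None) -> int:
    """Upper bound on messages per operation under the assumptions stated above."""
    per_call = 2 * n  # messages per invocation of communicate
    if max_recordings is None:
        max_recordings = 2 * n  # recording processes during one operation
    return (own_calls + max_recordings) * per_call
```

With the default arguments the bound is (2 + 2n) · 2n = 4n^2 + 4n, which is quadratic in n.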

The constructions of bounded sequential time-stamp systems ([31, 23]) imply that a label
can be represented using O(n) bits. The next theorem summarizes the above discussion.

Theorem 5.8 There exists a bounded emulator of an atomic, single-writer multi-reader register
in a complete network, in the presence of fewer than n/2 processor failures. Each execution
of a read operation or a write operation requires O(n^2) messages each of size O(n), O(1) time,
and O(n^4) local memory.


6  The  bounded  implementation  -  arbitrary  network 

In  an  arbitrary  network  a  processor  is  considered  faulty  if  it  cannot  communicate  with  a 
majority  of  the  processors,  and  a  correctly  functioning  processor  is  guaranteed  to  be  eventually 
in  the  same  connected  component  with  a  majority  of  the  processors.  The  first  construction 
in  this  section  is  achieved  by  replacing  every  send  operation  from  i  to  j  by  an  execution  of 
an  end-to-end  protocol  between  i  and  j.  Implementations  of  such  a  protocol  are  known  (see 
[5,  14,  6]).  An  end-to-end  protocol  establishes  traffic  between  i  and  j  if  there  is  eventually 
a path between them. In our case, eventually there will be a path between any nonfaulty
processor and a majority of the processors, thus the system behaves as in the case of a complete
network with processor failures.
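
The fault definition used here can be illustrated on a single snapshot of the network. The
Python sketch below assumes processors numbered 0..n-1 and a list of currently operational
bidirectional links; reachable and is_nonfaulty are hypothetical names, and the model itself
only requires the majority condition to hold eventually, not at every instant.

```python
from collections import deque


def reachable(start, links, n):
    """Return the set of processors connected to start over the given links."""
    adjacency = {v: set() for v in range(n)}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {start}
    frontier = deque([start])
    while frontier:
        v = frontier.popleft()
        for u in adjacency[v]:
            if u not in seen:
                seen.add(u)
                frontier.append(u)
    return seen


def is_nonfaulty(p, links, n):
    """True if p's connected component contains a majority of the n processors."""
    return len(reachable(p, links, n)) > n // 2
```

For example, with n = 5 and links (0,1), (1,2), (3,4), processor 0 lies in a component of
three processors and counts as nonfaulty in this snapshot, while processor 3 does not.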

Note  that  there  are  labels  in  the  system  that  will  not  appear  in  the  input  of  procedure 
LABEL.  However,  these  are  not  viable  labels  because  the  end-to-end  protocol  will  prevent 
processors  from  adopting  them  as  the  writer’s  label  and  hence  correctness  is  preserved. 

The  complexity  claims  in  the  next  theorem  are  implied  by  the  end-to-end  protocol  of  [6].® 

Theorem 6.1 There exists a bounded emulator of an atomic, single-writer multi-reader register
in an arbitrary network in the presence of link failures that do not disconnect a majority of
the processors. Each execution of a read operation or a write operation requires O(n^5) messages,
each of size O(n), and O(n^2) time.

Instead of implementing each virtual link separately we can achieve improved performance
by implementing communicate directly. We make use of the fact that Afek and Gafni ([6])
show how to resynchronize any diffusing computation ([21]), not only an end-to-end protocol.
Although the task achieved by communicate is not exactly a diffusing computation, we can
modify the algorithm of [6] by "piggybacking" acknowledgement information. The resulting
implementation requires O(n^3) messages per invocation of communicate. Thus we have

Theorem 6.2 There exists a bounded emulator of an atomic, single-writer multi-reader register
in an arbitrary network in the presence of link failures that do not disconnect a majority of
the processors. Each execution of a read operation or a write operation requires O(n^4) messages,
each of size O(n), and O(n^2) time.


7  Discussion  and  further  research 


We have presented emulators of atomic, single-writer multi-reader registers in message-passing
systems (networks), in the presence of processor or link failures. In the complete network, in
the presence of processor failures, each operation to the register requires O(n^2) messages, each
of size O(n), and constant time. In an arbitrary network, in the presence of link failures, each
operation to the register requires O(n^4) messages, each of size O(n), and O(n^2) time.

It would be interesting to improve the complexity of the emulations in either of the
message-passing systems. Alternatively, it might be possible to prove lower bounds on the cost
of such emulations.

® Any improvement in the complexity of the end-to-end protocol will immediately result in an improvement
to the complexity of our implementation.




An interesting direction is to emulate stronger shared-memory primitives in message-passing
systems in the presence of failures. Any primitive that can be implemented from wait-free,
atomic, single-writer multi-reader registers can also be implemented in message-passing
systems, using the emulators we have presented. This includes wait-free, atomic, multi-writer
multi-reader registers, atomic snapshots, and many others. However, there are shared-memory
data structures that cannot be implemented from wait-free, atomic, single-writer multi-reader
registers ([30]). Some of these primitives, such as Read-Modify-Write, can be used to solve
consensus ([30]), and thus any emulation of them in the presence of failures will imply a
solution to consensus in the presence of failures. It is known ([29]) that consensus cannot be solved
in asynchronous systems even in the presence of one failure. Thus, we need to strengthen the
message-passing model in order to emulate primitives such as Read-Modify-Write. Additional
power can be added to the message-passing model considered in this paper by, e.g., failure
detection mechanisms or automatic acknowledgement mechanisms (cf. [27]). We leave all of
this as a subject for future work.